
Marine Heatwave Detection#
Marine heatwaves are periods of persistent anomalously warm ocean temperatures, which can have significant impacts on marine life as well as coastal communities and economies. To detect the warm ocean water, sea surface temperature (SST) is usually used to define if there is any marine heatwave event. The following example is following the paper Jacox et al., 2022
Overview#
In this page/notebook, we will be go throught the following steps
Extract the data from the PSL OPeNDAP server
Calculate the SST climatology
Calculate the SST anomaly
Determine the SST threshold based on the anomaly
Identify the marine heatwaves based on threshold
Prerequisites#
To better understand and follow the steps in the notebook, it will be helpful for user to go through
Concepts |
Importance |
Notes |
|---|---|---|
Helpful |
Chunking and OPeNDAP access |
Time to learn: 15 minutes.
System requirements:
python
Xarray
pydap (not imported but will be used in the Xarray backend)
matplotlib (not imported but will be used in the Xarray plotting)
Numpy
plotly (only for the final interactive plot)
Imports#
# import the needed packages
import warnings
import xarray as xr
import numpy as np
import plotly.express as px
warnings.simplefilter("ignore")
warnings.simplefilter
This line of code is not affecting the execution but just removing some of the warning output that might clutter your notebook. However, do pay attention to some of the warnings since they will indicate some deprecation of function or arg/kwarg in future update.Extract the data from an OPeNDAP server#
In this page/notebook, we demonstrate how to use the NOAA OISST v2 High-resolution dataset to detect marine heatwaves. The dataset is currently hosted by NOAA Physical Sciences Laboratory.
Info
To explore more gridded datasets that are hosted at NOAA PSL, here is a useful search toolopendap_mon_url = "https://psl.noaa.gov/thredds/dodsC/Datasets/noaa.oisst.v2.highres/sst.mon.mean.nc"
Xarray getting remote data#
Xarray has a great support on accessing data in the cloud.
It has been continue to expend its capability and functionality with the community discussion like this.
Here we use the xr.open_dataset method with the keyword argument (engine='pydap') to use the pydap package in the backend to access the OPeNDAP server.
ds_mon = xr.open_dataset(opendap_mon_url, engine='pydap', chunks={'time':12,'lon':-1,'lat':-1})
Lazy Loading
We can load the data lazily (only loading the metadata and coordinates information) and peek at the data's dimension and availability on our local machine. The actual data (SST values at each grid point in this case) will only be downloaded from the PSL server when further data manipulation (subsetting and aggregation like calculating mean) is needed. The only thing user needs to do to activate this function is to read the netCDF file using the `xr.open_dataset()` method with the keyword argument `chunks={'time':12,'lon':-1,'lat':-1}` provided. The chunk reading approach provide the opportunity to reduce the memory usage on the local machine during the calculation, the possibility of parallelizing the processes, and side-stepping the data download limit set by the OPeNDAP server (PSL server has a 500MB limit). The dataset is loaded lazily (only metadata and coordinates) shown below.In our example, we set the size of each chunk to be 12(time)x1440(lon)x720(lat) (when setting the chunk size = -1, it will use the length of the dimension as the chunksize) which is equal to 47.46 MB of data while the entire dataset is 1.39 GB. This allows us to get data in 47.46 MB chunk per download request.
The dataset is loaded lazily (only metadata and coordinates) shown below.
ds_mon
<xarray.Dataset> Size: 2GB
Dimensions: (time: 525, lat: 720, lon: 1440)
Coordinates:
* time (time) datetime64[ns] 4kB 1981-09-01 1981-10-01 ... 2025-05-01
* lat (lat) float32 3kB -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
* lon (lon) float32 6kB 0.125 0.375 0.625 0.875 ... 359.4 359.6 359.9
Data variables:
sst (time, lat, lon) float32 2GB dask.array<chunksize=(12, 720, 1440), meta=np.ndarray>
Attributes:
Conventions: CF-1.5
title: NOAA/NCEI 1/4 Degree Daily Optimum Interpolation Se...
institution: NOAA/National Centers for Environmental Information
source: NOAA/NCEI https://www.ncei.noaa.gov/data/sea-surfac...
References: https://www.psl.noaa.gov/data/gridded/data.noaa.ois...
dataset_title: NOAA Daily Optimum Interpolation Sea Surface Temper...
version: Version 2.1
comment: Reynolds, et al.(2007) Daily High-Resolution-Blende...
_NCProperties: version=2,netcdf=4.7.0,hdf5=1.10.5,
Unlimited_Dimension: timeCalculate the SST climatology#
First, we need to define the period that we are going to use to calculate the climatology. Here, we picked the 2019-2020 period to calculate the climatology.
Climatology
For a more accurate and scientifically valid estimate of marine heatwaves, one should usually consider a climatology period of at least 30 years. Here we set the climatology period from 2019 to 2020 (2 years) to speed up the processing time and for demonstration only. The shorter period (less memory consumption) also makes the interactive notebook launch on this page available for the user to manipulate and play with the dataset.climo_start_yr = 2019 # determine the climatology/linear trend start year
climo_end_yr = 2020 # determine the climatology/linear trend end year
ds_mon_crop = ds_mon.where((ds_mon['time.year']>=climo_start_yr)&
(ds_mon['time.year']<=climo_end_yr),drop=True)
ds_mon_crop
<xarray.Dataset> Size: 100MB
Dimensions: (time: 24, lat: 720, lon: 1440)
Coordinates:
* time (time) datetime64[ns] 192B 2019-01-01 2019-02-01 ... 2020-12-01
* lat (lat) float32 3kB -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
* lon (lon) float32 6kB 0.125 0.375 0.625 0.875 ... 359.4 359.6 359.9
Data variables:
sst (time, lat, lon) float32 100MB dask.array<chunksize=(11, 720, 1440), meta=np.ndarray>
Attributes:
Conventions: CF-1.5
title: NOAA/NCEI 1/4 Degree Daily Optimum Interpolation Se...
institution: NOAA/National Centers for Environmental Information
source: NOAA/NCEI https://www.ncei.noaa.gov/data/sea-surfac...
References: https://www.psl.noaa.gov/data/gridded/data.noaa.ois...
dataset_title: NOAA Daily Optimum Interpolation Sea Surface Temper...
version: Version 2.1
comment: Reynolds, et al.(2007) Daily High-Resolution-Blende...
_NCProperties: version=2,netcdf=4.7.0,hdf5=1.10.5,
Unlimited_Dimension: timeTo calculate the SST monthly climatology, we utilize the groupby method from Xarray.
ds_mon_climo = ds_mon_crop.groupby('time.month').mean()
Calculate the SST anomaly#
After the climatology is determined, we subtract the climatology from the original data to get the anomaly.
ds_mon_anom = (ds_mon_crop.groupby('time.month')-ds_mon_climo).compute()
.compute()
Notice the `.compute()` method in the code above. The data of SST is only loaded chunk-by-chunk, cropped to the desired period, averaged in the group of months, and finally subtracted the climatology from the original data when we execute the `.compute()` line. All these tasks are now executed in the background with a distributed server assigning tasks to different CPUs.ds_mon_anom
<xarray.Dataset> Size: 100MB
Dimensions: (time: 24, lat: 720, lon: 1440)
Coordinates:
* time (time) datetime64[ns] 192B 2019-01-01 2019-02-01 ... 2020-12-01
* lat (lat) float32 3kB -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
* lon (lon) float32 6kB 0.125 0.375 0.625 0.875 ... 359.4 359.6 359.9
month (time) int64 192B 1 2 3 4 5 6 7 8 9 10 ... 3 4 5 6 7 8 9 10 11 12
Data variables:
sst (time, lat, lon) float32 100MB 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0Determine the SST threshold based on the anomaly#
Based on the Jacox et al., 2022, the threshold is determined based on a three month window with the center month being the monthly threhold one need to determined (e.g. January threshold is determined by all December, January, Feburary SST anomalies). Therefore, the function below is written to perform the three months window percentile operation.
########## Functions #########
# Function to calculate the 3 month rolling Quantile
def mj_3mon_quantile(da_data, mhw_threshold=90.):
da_data_quantile = xr.DataArray(coords={'lon':da_data.lon,
'lat':da_data.lat,
'month':np.arange(1,13)},
dims = ['month','lat','lon'])
for i in range(1,13):
if i == 1:
mon_range = [12,1,2]
elif i == 12 :
mon_range = [11,12,1]
else:
mon_range = [i-1,i,i+1]
da_data_quantile[i-1,:,:] = (da_data
.where((da_data['time.month'] == mon_range[0])|
(da_data['time.month'] == mon_range[1])|
(da_data['time.month'] == mon_range[2]),drop=True)
.quantile(mhw_threshold*0.01, dim = 'time', skipna = True))
return da_data_quantile
%time da_mon_quantile = mj_3mon_quantile(ds_mon_anom.sst, mhw_threshold=90)
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
Cell In[12], line 1
----> 1 get_ipython().run_line_magic('time', 'da_mon_quantile = mj_3mon_quantile(ds_mon_anom.sst, mhw_threshold=90)')
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/IPython/core/interactiveshell.py:2488, in InteractiveShell.run_line_magic(self, magic_name, line, _stack_depth)
2486 kwargs['local_ns'] = self.get_local_scope(stack_depth)
2487 with self.builtin_trap:
-> 2488 result = fn(*args, **kwargs)
2490 # The code below prevents the output from being displayed
2491 # when using magics with decorator @output_can_be_silenced
2492 # when the last Python token in the expression is a ';'.
2493 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/IPython/core/magics/execution.py:1390, in ExecutionMagics.time(self, line, cell, local_ns)
1388 st = clock2()
1389 try:
-> 1390 exec(code, glob, local_ns)
1391 out = None
1392 # multi-line %%time case
File <timed exec>:1
Cell In[11], line 22, in mj_3mon_quantile(da_data, mhw_threshold)
15 else:
16 mon_range = [i-1,i,i+1]
18 da_data_quantile[i-1,:,:] = (da_data
19 .where((da_data['time.month'] == mon_range[0])|
20 (da_data['time.month'] == mon_range[1])|
21 (da_data['time.month'] == mon_range[2]),drop=True)
---> 22 .quantile(mhw_threshold*0.01, dim = 'time', skipna = True))
24 return da_data_quantile
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/xarray/core/dataarray.py:5330, in DataArray.quantile(self, q, dim, method, keep_attrs, skipna, interpolation)
5221 def quantile(
5222 self,
5223 q: ArrayLike,
(...) 5229 interpolation: QuantileMethods | None = None,
5230 ) -> Self:
5231 """Compute the qth quantile of the data along the specified dimension.
5232
5233 Returns the qth quantiles(s) of the array elements.
(...) 5327 The American Statistician, 50(4), pp. 361-365, 1996
5328 """
-> 5330 ds = self._to_temp_dataset().quantile(
5331 q,
5332 dim=dim,
5333 keep_attrs=keep_attrs,
5334 method=method,
5335 skipna=skipna,
5336 interpolation=interpolation,
5337 )
5338 return self._from_temp_dataset(ds)
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/xarray/core/dataset.py:8158, in Dataset.quantile(self, q, dim, method, numeric_only, keep_attrs, skipna, interpolation)
8152 if reduce_dims or not var.dims:
8153 if name not in self.coords and (
8154 not numeric_only
8155 or np.issubdtype(var.dtype, np.number)
8156 or var.dtype == np.bool_
8157 ):
-> 8158 variables[name] = var.quantile(
8159 q,
8160 dim=reduce_dims,
8161 method=method,
8162 keep_attrs=keep_attrs,
8163 skipna=skipna,
8164 )
8166 else:
8167 variables[name] = var
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/xarray/core/variable.py:1949, in Variable.quantile(self, q, dim, method, keep_attrs, skipna, interpolation)
1945 axis = tuple(range(-1, -1 * len(dim) - 1, -1))
1947 kwargs = {"q": q, "axis": axis, "method": method}
-> 1949 result = apply_ufunc(
1950 _wrapper,
1951 self,
1952 input_core_dims=[dim],
1953 exclude_dims=set(dim),
1954 output_core_dims=[["quantile"]],
1955 output_dtypes=[np.float64],
1956 dask_gufunc_kwargs=dict(output_sizes={"quantile": len(q)}),
1957 dask="allowed" if module_available("dask", "2024.11.0") else "parallelized",
1958 kwargs=kwargs,
1959 )
1961 # for backward compatibility
1962 result = result.transpose("quantile", ...)
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/xarray/computation/apply_ufunc.py:1273, in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, on_missing_core_dim, *args)
1271 # feed Variables directly through apply_variable_ufunc
1272 elif any(isinstance(a, Variable) for a in args):
-> 1273 return variables_vfunc(*args)
1274 else:
1275 # feed anything else through apply_array_ufunc
1276 return apply_array_ufunc(func, *args, dask=dask)
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/xarray/computation/apply_ufunc.py:814, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
809 elif vectorize:
810 func = _vectorize(
811 func, signature, output_dtypes=output_dtypes, exclude_dims=exclude_dims
812 )
--> 814 result_data = func(*input_data)
816 if signature.num_outputs == 1:
817 result_data = (result_data,)
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/xarray/core/variable.py:1942, in Variable.quantile.<locals>._wrapper(npa, **kwargs)
1940 def _wrapper(npa, **kwargs):
1941 # move quantile axis to end. required for apply_ufunc
-> 1942 return xp.moveaxis(_quantile_func(npa, **kwargs), 0, -1)
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/xarray/core/nputils.py:242, in _create_method.<locals>.f(values, axis, **kwargs)
240 result = np.float64(result)
241 else:
--> 242 result = getattr(npmodule, name)(values, axis=axis, **kwargs)
244 return result
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/numpy/lib/_nanfunctions_impl.py:1598, in nanquantile(a, q, axis, out, overwrite_input, method, keepdims, weights, interpolation)
1595 if np.any(weights < 0):
1596 raise ValueError("Weights must be non-negative.")
-> 1598 return _nanquantile_unchecked(
1599 a, q, axis, out, overwrite_input, method, keepdims, weights)
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/numpy/lib/_nanfunctions_impl.py:1617, in _nanquantile_unchecked(a, q, axis, out, overwrite_input, method, keepdims, weights)
1615 if a.size == 0:
1616 return np.nanmean(a, axis, out=out, keepdims=keepdims)
-> 1617 return fnb._ureduce(a,
1618 func=_nanquantile_ureduce_func,
1619 q=q,
1620 weights=weights,
1621 keepdims=keepdims,
1622 axis=axis,
1623 out=out,
1624 overwrite_input=overwrite_input,
1625 method=method)
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/numpy/lib/_function_base_impl.py:3912, in _ureduce(a, func, keepdims, **kwargs)
3909 index_out = (0, ) * nd
3910 kwargs['out'] = out[(Ellipsis, ) + index_out]
-> 3912 r = func(a, **kwargs)
3914 if out is not None:
3915 return out
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/numpy/lib/_nanfunctions_impl.py:1648, in _nanquantile_ureduce_func(a, q, weights, axis, out, overwrite_input, method)
1646 # Note that this code could try to fill in `out` right away
1647 elif weights is None:
-> 1648 result = np.apply_along_axis(_nanquantile_1d, axis, a, q,
1649 overwrite_input, method, weights)
1650 # apply_along_axis fills in collapsed axis with results.
1651 # Move those axes to the beginning to match percentile's
1652 # convention.
1653 if q.ndim != 0:
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/numpy/lib/_shape_base_impl.py:416, in apply_along_axis(func1d, axis, arr, *args, **kwargs)
414 buff[ind0] = res
415 for ind in inds:
--> 416 buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
418 res = transpose(buff, buff_permute)
419 return conv.wrap(res)
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/numpy/lib/_nanfunctions_impl.py:1696, in _nanquantile_1d(arr1d, q, overwrite_input, method, weights)
1692 if arr1d.size == 0:
1693 # convert to scalar
1694 return np.full(q.shape, np.nan, dtype=arr1d.dtype)[()]
-> 1696 return fnb._quantile_unchecked(
1697 arr1d,
1698 q,
1699 overwrite_input=overwrite_input,
1700 method=method,
1701 weights=weights,
1702 )
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/numpy/lib/_function_base_impl.py:4567, in _quantile_unchecked(a, q, axis, out, overwrite_input, method, keepdims, weights)
4558 def _quantile_unchecked(a,
4559 q,
4560 axis=None,
(...) 4564 keepdims=False,
4565 weights=None):
4566 """Assumes that q is in [0, 1], and is an ndarray"""
-> 4567 return _ureduce(a,
4568 func=_quantile_ureduce_func,
4569 q=q,
4570 weights=weights,
4571 keepdims=keepdims,
4572 axis=axis,
4573 out=out,
4574 overwrite_input=overwrite_input,
4575 method=method)
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/numpy/lib/_function_base_impl.py:3912, in _ureduce(a, func, keepdims, **kwargs)
3909 index_out = (0, ) * nd
3910 kwargs['out'] = out[(Ellipsis, ) + index_out]
-> 3912 r = func(a, **kwargs)
3914 if out is not None:
3915 return out
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/numpy/lib/_function_base_impl.py:4742, in _quantile_ureduce_func(a, q, weights, axis, out, overwrite_input, method)
4740 arr = a.copy()
4741 wgt = weights
-> 4742 result = _quantile(arr,
4743 quantiles=q,
4744 axis=axis,
4745 method=method,
4746 out=out,
4747 weights=wgt)
4748 return result
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/numpy/lib/_function_base_impl.py:4961, in _quantile(arr, quantiles, axis, method, out, weights)
4958 if result.shape == () and result.dtype == np.dtype("O"):
4959 result = result.item()
-> 4961 if np.any(slices_having_nans):
4962 if result.ndim == 0 and out is None:
4963 # can't write to a scalar, but indexing will be correct
4964 result = arr[-1]
File ~/micromamba/envs/marine-heatwave-cookbook-dev/lib/python3.13/site-packages/numpy/_core/fromnumeric.py:2476, in any(a, axis, out, keepdims, where)
2471 def _any_dispatcher(a, axis=None, out=None, keepdims=None, *,
2472 where=np._NoValue):
2473 return (a, where, out)
-> 2476 @array_function_dispatch(_any_dispatcher)
2477 def any(a, axis=None, out=None, keepdims=np._NoValue, *, where=np._NoValue):
2478 """
2479 Test whether any array element along a given axis evaluates to True.
2480
(...) 2577
2578 """
2579 return _wrapreduction_any_all(a, np.logical_or, 'any', axis, out,
2580 keepdims=keepdims, where=where)
KeyboardInterrupt:
Tip
The `%time` command is jupyter cell magic to time the one-liner cell operation. It provides a great way to find the bottleneck of your data processing steps.The determined threshold value of each grid of each month is shown below
da_mon_quantile.isel(month=0).plot(vmin=0,vmax=3)
<matplotlib.collections.QuadMesh at 0x7fd2ec2270d0>
Identify the marine heatwaves based on threshold#
The figure below shows the original SST anomaly value for the first month.
ds_mon_anom.sst.isel(time=0).plot(vmin=0,vmax=3)
<matplotlib.collections.QuadMesh at 0x7fd2ec12b010>
To identify the marine heatwaves based on the monthly threshold, we use the where method to find the monthly marine heatwaves with the grid that has SST anomaly below the threshold to be masked as Not-a-Number.
da_mhw = ds_mon_anom.sst.where(ds_mon_anom.sst.groupby('time.month')>da_mon_quantile)
The figure below shows the SST anomalous values that are above the monthly thresholds for the first months.
da_mhw.isel(time=0).plot(vmin=0,vmax=3)
<matplotlib.collections.QuadMesh at 0x7fd2da7e9250>
Interactive plot#
The interactive plot is a great tool for looking at a local changes through zoom in.
Plotly provides a great interface for the user to also pin point the actual value at the point where they are interested in.
The only thing that need further data manipulation for using the plotly tool is to convert the Xarray DataArray to Pandas DataFrame.
However, this can be easily achieved throught the method .to_dataframe() provided by the Xarray package.
dff = (da_mhw.isel(time=0)
.to_dataframe()
.reset_index()
.dropna()
)
dff = dff.rename(columns={'sst':'MHW magnitude'})
Plotly setting#
After the DataFrame is created the plotly map can be created. Here, we are only using some simple options. More detail setups and options can be find on the Plotly documentation
# Setup the scatter mapbox detail
center = {'lat':38,'lon':-94} # center of the map
zoom = 2 # zoom level of the map
marker_size = 8 # marker size used on the map
mapbox_style = 'carto-positron' # mapbox options
fig = px.scatter_mapbox(dff,
lon = 'lon',
lat = 'lat',
color = 'MHW magnitude',
color_continuous_scale = 'orrd'
)
fig.update_layout(
mapbox={
'style': mapbox_style,
'center': center,
'zoom': zoom,
}
)
# Update the marker size using update_traces
fig.update_traces(marker=dict(size=marker_size))
Summary#
Through this example, we demostrate how to lazily loaded a real world SST data from a OPeNDAP server and calculate the threshold that help us define the marine heatwave. By using the threshold, we can find the marine heatwave in each month.
What’s next?#
A more interactive figures to view the marine heatwave will be added.